Thanks to huge efforts by citizen scientists, a majority of species observations in the Atlas of Living Australia are collected opportunistically, where people record observations incidentally rather than through a recurring monitoring program.
However, whether an observation is recorded or not doesn’t just depend on the species. It might be rainy, it might be too hot, an area might be inaccessible; all of these factors can affect whether people make an observation.
The COVID-19 pandemic had a major impact on people’s health, behaviour and travel. In Australia, several lockdowns over 2020-2021 imposed restrictions on people’s movements, limiting them to either stay at home or choose activities (mainly exercise) that they could do in their local area. Melbourne experienced the longest continuous lockdown in the world.
To what extent did COVID-19 and lockdowns affect the number of species observations people made over that time? Here, we’ll use a highlighted time-series plot to investigate how lockdowns affected the data collection of Anatidae observations (ducks, geese and swans), a group frequently recorded on walks and outdoor gatherings, in Melbourne compared to previous years.
Get data
We’ll start by downloading Anatidae records.
Left: Tadorna (Casarca) tadornoides (Tracey Hinton CC-BY-NC 4.0 (Int)) Middle: Cygnus (Chenopis) atratus (jpshahady CC-BY-NC 4.0 (Int)) Right: Spatula rhynchotis (Annette Green CC-BY-NC 4.0 (Int))
First, let’s load some packages:
# Load packages
library(galah)
library(tidyverse)
library(grid)
library(pilot) # remotes::install_github("olihawkins/pilot")
library(ggtext)
library(showtext)Let’s use the {galah} package to download duck records in Melbourne from 2017-2021 to get records from years before and during COVID-19 lockdowns.
Searching with galah::search_fields() shows us that {galah} contains Greater Capital City Statistical Areas, which we can use to filter our query.
search_all(fields, "city")# A tibble: 1 × 4
id description type link
<chr> <chr> <chr> <chr>
1 cl10929 PSMA ABS Greater Capital City Statistical Areas (2016) AB… laye… http…
search_all(fields, "city") |> search_values("melbourne")• Showing values for 'cl10929'.
# A tibble: 1 × 2
field category
<chr> <chr>
1 cl10929 GREATER MELBOURNE
Let’s build our query to return Anatidae records from GREATER MELBOURNE and use galah_select() to return only the eventDate column.
You will need to first provide a registered email with the ALA using galah_config() before retrieving records.
# Add registered email (register at ala.org.au)
galah_config(email = "your-email@email.com")ducks <-
galah_call() |>
galah_identify("Anatidae") |>
galah_filter(
cl10929 == "GREATER MELBOURNE",
year >= 2017,
year <= 2021,
basisOfRecord == "HUMAN_OBSERVATION"
) |>
galah_select(eventDate) |>
atlas_occurrences()
ducks |> head(6L)# A tibble: 6 × 1
eventDate
<dttm>
1 2016-12-31 13:00:00
2 2016-12-31 13:00:00
3 2016-12-31 13:00:00
4 2016-12-31 13:00:00
5 2016-12-31 13:00:00
6 2016-12-31 13:00:00
We’ll pull out some week and year of each date and count the total occurrence records for each week.
ducks_weekly <- ducks |>
mutate(date = as_date(eventDate),
year = year(eventDate),
week = week(eventDate)) |>
filter(year > 2016) |> # remove stray 2016 records
group_by(year, week) |>
summarise(week_count = n())
ducks_weekly # A tibble: 265 × 3
# Groups: year [5]
year week week_count
<dbl> <dbl> <int>
1 2017 1 647
2 2017 2 670
3 2017 3 665
4 2017 4 790
5 2017 5 745
6 2017 6 580
7 2017 7 658
8 2017 8 680
9 2017 9 575
10 2017 10 541
# … with 255 more rows
We want to compare record counts in 2020-2021 to previous years, but because records added the ALA have increased each year, comparing raw numbers will be an unequal comparison and bias our results.
To avoid this, let’s scale our record counts by the total number of Anatidae records each year so that we compare proportions rather than raw numbers.
First let’s download the total Anatidae records for each year.
ducks_yearly <-
galah_call() |>
galah_identify("Anatidae") |>
galah_filter(cl10929 == "GREATER MELBOURNE",
year >= 2017, year <= 2021) |>
galah_group_by(year) |>
atlas_counts() |>
rename(year_records = count) |>
mutate(year = as.numeric(year)) |>
arrange(-desc(year))
ducks_yearly# A tibble: 5 × 2
year year_records
<dbl> <int>
1 2017 37194
2 2018 48696
3 2019 47874
4 2020 48850
5 2021 57520
Now we’ll split ducks_weekly into 5 data.frames for each year and recursively divide each weekly record count by the total records for that year with map2(). At the end we’ll bind the separate data.frames into one again with bind_rows().
ducks_scaled <- ducks_weekly %>%
split(.$year) %>%
map2(.x = .,
.y = ducks_yearly$year_records,
~ .x %>%
mutate(prop = .x$week_count / .y) %>%
select(-week_count)
) %>%
bind_rows()
ducks_scaled# A tibble: 265 × 3
# Groups: year [5]
year week prop
<dbl> <dbl> <dbl>
1 2017 1 0.0174
2 2017 2 0.0180
3 2017 3 0.0179
4 2017 4 0.0212
5 2017 5 0.0200
6 2017 6 0.0156
7 2017 7 0.0177
8 2017 8 0.0183
9 2017 9 0.0155
10 2017 10 0.0145
# … with 255 more rows
To compare the “normal” number of records collected prior to COVID-19 to the numbers collected throughout 2020 and 2021, it would be nice to simplify these numbers into 2 lines: one with the the average weekly “normal” count, and one with the weekly count over 2020 and 2021.
To create the average weekly “normal” line, let’s calculate the mean proportion of records each week from 2017-2019.
To do this, we’ll place our weekly proportions in separate columns using pivot_wider() and calculate the mean proportion of observations each week across year_2017, year_2018 and year_2019 columns.
ducks_scaled <- ducks_scaled %>%
pivot_wider(names_from = year,
values_from = prop,
names_sort = TRUE,
names_glue = "year_{year}") |>
rowwise() |>
mutate(
mean_2017_19 = mean(c_across(year_2017:year_2019))
) |>
select(-year_2017, -year_2018, -year_2019)
ducks_scaled# A tibble: 53 × 4
# Rowwise:
week year_2020 year_2021 mean_2017_19
<dbl> <dbl> <dbl> <dbl>
1 1 0.0283 0.0218 0.0223
2 2 0.0191 0.0186 0.0221
3 3 0.0251 0.0202 0.0200
4 4 0.0243 0.0175 0.0199
5 5 0.0159 0.0203 0.0183
6 6 0.0175 0.0205 0.0156
7 7 0.0147 0.0157 0.0180
8 8 0.0154 0.0190 0.0173
9 9 0.0128 0.0184 0.0144
10 10 0.0117 0.0167 0.0161
# … with 43 more rows
We have all the numbers we need for plotting, but just need to reorganise them so they plot correctly.
We have two separate columns containing 2020 and 2021 proportional counts, weeks 1-53 (because 2021 didn’t start exactly when the week ended). To plot these counts as one continuous line from Jan 2020 to Dec 2021 (106 weeks total), we’ll need to place our 2021 counts under our 2020 counts in the same column and convert the week numbers for 2021 to 54-106.
# 2021 record count proportions
ducks_scaled_2021 <- ducks_scaled |>
select(-year_2020) |>
rename(prop = year_2021) |>
mutate(week = 53 + week)
# 2020 + 2021 record count proportions
ducks_final <- ducks_scaled |>
select(-year_2021) |>
rename(prop = year_2020) |>
bind_rows(ducks_scaled_2021)
ducks_final# A tibble: 106 × 3
# Rowwise:
week prop mean_2017_19
<dbl> <dbl> <dbl>
1 1 0.0283 0.0223
2 2 0.0191 0.0221
3 3 0.0251 0.0200
4 4 0.0243 0.0199
5 5 0.0159 0.0183
6 6 0.0175 0.0156
7 7 0.0147 0.0180
8 8 0.0154 0.0173
9 9 0.0128 0.0144
10 10 0.0117 0.0161
# … with 96 more rows
Lockdowns
During the height of the pandemic, Melbourne had 6 distinct lockdowns. Let’s add their start and end dates to a tibble.
Because we want to plot 2020 and 2021 on the same plot, we’ll use ifelse() to make sure our week numbers in 2021 match our week numbers in ducks_final.
n_lockdown <- c(1:6)
start_date <- c("2020-03-31", "2020-07-09",
"2021-02-13", "2021-05-28",
"2021-07-16", "2021-08-05")
end_date <- c("2020-05-12", "2020-10-27",
"2021-02-17", "2021-06-10",
"2021-07-27", "2021-10-21")
lockdowns <- tibble(n_lockdown, start_date, end_date) |>
mutate(
n_days = as_date(ymd(end_date)) - as_date(ymd(start_date)),
week_start = ifelse(year(start_date) == 2020,
week(start_date), week(start_date) + 53),
week_end = ifelse(year(end_date) == 2020,
week(end_date), week(end_date) + 53),
)
lockdowns # A tibble: 6 × 6
n_lockdown start_date end_date n_days week_start week_end
<int> <chr> <chr> <drtn> <dbl> <dbl>
1 1 2020-03-31 2020-05-12 42 days 13 19
2 2 2020-07-09 2020-10-27 110 days 28 43
3 3 2021-02-13 2021-02-17 4 days 60 60
4 4 2021-05-28 2021-06-10 13 days 75 76
5 5 2021-07-16 2021-07-27 11 days 82 83
6 6 2021-08-05 2021-10-21 77 days 84 95
Make plot
To help us see the components of our final plot more clearly, let’s construct our visualisation step-by-step.
First, we’ll add our lockdown dates as highlighted rectangular blocks. To do this we can use geom_rect(), setting the xmin and xmax values to our week_start and week_end columns in lockdowns. We’ll make the rectangle spread across the entire plot by setting ymax = Inf and ymin = 0.
We’ll also set our fill inside of aes() and define its value within scale_fill_manual() which will allow us to add the lockdown colour and label to its own legend.
p1 <- ggplot() +
geom_rect(data = lockdowns,
aes(xmin = week_start,
xmax = week_end,
fill = "Lockdown"),
ymin = 0,
ymax = Inf,
alpha=0.2) +
scale_fill_manual(
values = c("Lockdown" = pilot_color("yellow")))
p1Next we’ll add our proportional species observation counts as lines. We can define their colours and edit the legend and axis labels, too.
p2 <- p1 +
# add lines
geom_line(data = ducks_final,
aes(x = week, y = prop,
color = "2020-21 Records"),
linewidth = 1) +
geom_line(data = ducks_final,
aes(x = week, y = mean_2017_19,
color = "2017-19 Average"),
linetype = "twodash",
linewidth = 0.8) +
# add fill
geom_area(data = ducks_final,
aes(x = week, y = prop),
fill=pilot_color("blue"),
alpha=0.3) +
scale_color_manual(values = c(pilot_color("orange"),
pilot_color("blue")),
labels = c("2017-19 (average)",
"2020-21")) +
guides(fill = guide_legend(title = ""),
color = guide_legend(title = "Year")) +
labs(y = "Proportion of ducks recorded",
x = "Month",
title = "Duck, geese & swan observations in Melbourne (2020-21 vs. 2017-19)",
subtitle = "Weekly ALA records during and prior to COVID-19",
caption = "Weekly proportions scaled by total records for that year")
p2The plot above is enough to see everything we need to see from our data. You could stop here if you wished, but we’ve made a more refined visualisation with nice fonts, axis scales, axis labels and titles:
Code
# add fonts
font_add_google("Montserrat", family = "mont")
font_add_google("Hind", family = "hind")
showtext_auto(enable = TRUE)
p2 +
# make axis scales understandable
scale_y_continuous(expand = c(0, 0),
limits = c(0, 0.035),
labels = scales::percent_format()) +
scale_x_continuous(expand = c(0, 0),
limits = c(1, 106),
breaks = c(1, 14, 27, 40, 53, 66, 79, 92),
labels = c("Jan", "Apr", "Jul", "Oct", "Jan", "Apr", "Jul", "Oct")) +
# add year x axis labels
coord_cartesian(clip = "off") +
annotate_pilot(label = "2020", x = 27, y = 0,
alpha = 0.7, vjust = 3.8,size = 10) +
annotate_pilot(label = "2021", x = 79, y = 0,
alpha = 0.7, vjust = 3.8, size = 10) +
labs(title = "Duck, geese & swan observations in Melbourne (2020-21 vs. 2017-19)") +
theme_pilot(grid = "",
axes = "bl") +
theme(axis.title.x = element_text(size = 23, vjust = -1.3),
axis.title.y = element_text(size = 23),
axis.text.x = element_text(size = 20),
axis.text.y = element_text(size = 20),
axis.line = element_line(linewidth = 0.5),
legend.text = element_text(size = 23),
legend.title = element_text(size = 20),
plot.caption = element_text(size = 18),
text = element_text(family = "hind"),
plot.title = element_markdown(family = "mont", size = 31),
plot.subtitle = element_text(family = "mont", size = 28))Now that we have our final plot, we can see a few interesting trends:
- In the first lockdown (soon after COVID-19 arrived in Australia), species observations were lower than the 2017-2019 average.
- In the 2 longest lockdowns, species observations were higher than the 2017-2019 average.
- In the last half of all lockdowns except the first lockdown, observations increased.
Are these trends that you expected to see?
It’s impossible to make any claims of why these trends emerged from our data visualisation alone, but we can speculate (for fun).
Were people making fewer observations in the first lockdown because they were preoccupied with all the other priorities of settling into new working-from-home routines? Did people make more observations at the tail end of lockdowns because they were seeking relief from being inside by spending more time by natural ponds and lakes?
Some evidence from the US found more people were using natural spaces during COVID-19, and time in these spaces lowered depression and anxiety. A New Zealand study found similar results.
Final thoughts
This post has detailed you can use ALA data to explore relationships between record counts and events. Although we can’t make any causal claims from what we see in our plot, it’s a nice way to do some exploratory analysis of a lot of data!
Expand for session info
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.2.2 (2022-10-31 ucrt)
os Windows 10 x64 (build 19044)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2023-03-17
pandoc 2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
dplyr * 1.1.0 2023-01-29 [1] CRAN (R 4.2.2)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.2.2)
galah * 1.5.1 2023-02-21 [1] Github (AtlasOfLivingAustralia/galah@bd43dd2)
ggplot2 * 3.4.1 2023-02-10 [1] CRAN (R 4.2.2)
ggtext * 0.1.2 2022-09-16 [1] CRAN (R 4.2.2)
htmltools * 0.5.4 2022-12-07 [1] CRAN (R 4.2.2)
lubridate * 1.9.2 2023-02-10 [1] CRAN (R 4.2.2)
pilot * 4.0.0 2022-07-13 [1] Github (olihawkins/pilot@f08cc16)
purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.2.2)
readr * 2.1.4 2023-02-10 [1] CRAN (R 4.2.2)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
showtext * 0.9-5 2022-02-09 [1] CRAN (R 4.2.1)
showtextdb * 3.0 2020-06-04 [1] CRAN (R 4.2.1)
stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.2.2)
sysfonts * 0.8.8 2022-03-13 [1] CRAN (R 4.2.1)
tibble * 3.1.8 2022-07-22 [1] CRAN (R 4.2.1)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.2.2)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.2.2)
[1] C:/Users/KEL329/R-packages
[2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.2.2/library
──────────────────────────────────────────────────────────────────────────────